QuicK-means: accelerating inference for K-means by learning fast transforms

Abstract

K-means, and the celebrated Lloyd's algorithm, is more than the clustering method it was originally designed to be. It has indeed proven pivotal in helping to increase the speed of many machine learning and data analysis techniques such as indexing, nearest-neighbor search and prediction, compression and, lately, inference with kernel machines. Here, we introduce an efficient extension of K-means, dubbed QuicK-means, that rests on the idea of expressing the matrix of the $$K$$ cluster centroids as a product of sparse matrices, a feat made possible by recent results devoted to finding approximations of matrices as products of sparse factors. Using such a decomposition squashes the complexity of the matrix-vector product between the factorized $$K\times D$$ centroid matrix $${\mathbf {U}}$$ and any vector from $${\mathcal {O}}\left( KD\right)$$ to $${\mathcal {O}}\left( A \log B + B\right)$$, with $$A=\min \left( K,D\right)$$ and $$B=\max \left( K,D\right)$$, where $$D$$ is the dimension of the data. This drastic computational saving has a direct impact on the process of assigning a point to a cluster. We propose to learn such a factorization during the training procedure. We show that resorting to a factorization step at each iteration does not impair the convergence of the optimization scheme, and we demonstrate the benefits of our approach experimentally.
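The effect of a sparse factorization on the assignment step can be illustrated with a small sketch. It is not the authors' implementation and does not learn any factors. As a stand-in for a learned centroid matrix it assumes $${\mathbf {U}}$$ is a $$64\times 64$$ Hadamard matrix, which is known to factor exactly into $$\log _2 64 = 6$$ sparse factors with two nonzeros per row. All function names are hypothetical. The assignment uses the identity $$\arg \min _k \Vert x-u_k\Vert ^2 = \arg \min _k \left( \Vert u_k\Vert ^2 - 2\,({\mathbf {U}}x)_k\right)$$, so only the product $${\mathbf {U}}x$$ depends on the query point.

```python
import random

def butterfly_factor(n, stride):
    """Sparse n x n factor with two nonzeros per row, stored as rows of (col, value) pairs."""
    rows = []
    for r in range(n):
        lo, hi = r & ~stride, r | stride  # the two indices that differ only in the `stride` bit
        sign = 1.0 if r == lo else -1.0
        rows.append([(lo, 1.0), (hi, sign)])
    return rows

def sparse_matvec(rows, x):
    """Multiply one sparse factor by a vector, touching only its nonzeros."""
    return [sum(v * x[c] for c, v in row) for row in rows]

def factorized_matvec(factors, x):
    """Compute U x for U = S_1 S_2 ... S_Q by applying the rightmost factor first."""
    for rows in reversed(factors):
        x = sparse_matvec(rows, x)
    return x

def assign_fast(factors, sq_norms, x):
    """Nearest centroid via ||u_k||^2 - 2 (U x)_k, using the factorized product."""
    ux = factorized_matvec(factors, x)
    scores = [nrm - 2.0 * dot for nrm, dot in zip(sq_norms, ux)]
    return min(range(len(scores)), key=scores.__getitem__)

def assign_dense(U, x):
    """Reference assignment: explicit squared distance to every centroid (row of U)."""
    dists = [sum((xi - ui) ** 2 for xi, ui in zip(x, row)) for row in U]
    return min(range(len(dists)), key=dists.__getitem__)

# Assumption for illustration: K = D = 64 and the centroids are the rows of a Hadamard matrix.
n = 64
factors = [butterfly_factor(n, 1 << j) for j in range(6)]
U = [[(-1.0) ** bin(i & j).count("1") for j in range(n)] for i in range(n)]
sq_norms = [sum(u * u for u in row) for row in U]  # precomputed once, independent of x
nnz = sum(len(row) for rows in factors for row in rows)

random.seed(0)
x = [float(random.randint(-9, 9)) for _ in range(n)]
print(nnz, n * n)  # nonzeros in the factorization vs. entries of the dense matrix
print(assign_fast(factors, sq_norms, x) == assign_dense(U, x))
```

In this toy setting the six factors hold 768 nonzeros in total, against 4096 entries in the dense matrix, and the gap widens as $$K$$ and $$D$$ grow. A learned factorization would generally only approximate the centroid matrix, so the fast and dense assignments need not always agree as they do here.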

Related articles

Accelerating Lloyd’s Algorithm for k-Means Clustering

The k-means clustering algorithm, a staple of data mining and unsupervised learning, is popular because it is simple to implement, fast, easily parallelized, and offers intuitive results. Lloyd’s algorithm is the standard batch, hill-climbing approach for minimizing the k-means optimization criterion. It spends a vast majority of its time computing distances between each of the k cluster center...

Fast k-means algorithm clustering

k-means has recently been recognized as one of the best algorithms for clustering unsupervised data. Since k-means depends mainly on distance calculations between all data points and the centers, the time cost is high when the dataset is large (for example, more than 500 million points). We propose a two-stage algorithm to reduce the time cost of distance calculation for huge ...

Genetic Algorithm Based K-Means Fast Learning Artificial Neural Network

The K-means Fast Learning Artificial Neural Network (KFLANN) is a small neural network bearing two types of parameters, the tolerance δ and the vigilance μ. In previous papers, it was shown that the KFLANN was capable of fast and accurate assimilation of data [12]. However, determining suitable values for δ and μ remained an unsolved issue in [12]. This paper continues to follow up ...

Learning the k in k-means

When clustering a dataset, the right number k of clusters to use is often not obvious, and choosing k automatically is a hard algorithmic problem. In this paper we present an improved algorithm for learning k while clustering. The G-means algorithm is based on a statistical test for the hypothesis that a subset of data follows a Gaussian distribution. G-means runs k-means with increasing k in a...

Journal

Journal title: Machine Learning

Year: 2021

ISSN: 0885-6125, 1573-0565

DOI: https://doi.org/10.1007/s10994-021-05965-0